HSS8005
  • Module plan
  • Resources
  • Materials
  • Data
  • Assessment
  • Canvas

Week 3 Computer Lab Worksheet

  • Weekly materials

  • Week 1
    • Notes
    • Presentation
    • Worksheet
    • Handout
  • Week 2
    • Notes
    • Presentation
    • Worksheet
    • Handout
  • Week 3
    • Notes
    • Presentation
    • Worksheet
    • Handout
  • Week 4
    • Notes
    • Presentation
    • Worksheet
    • Handout
  • Week 5
    • Notes
    • Presentation
    • Worksheet
    • Handout
  • Week 6
    • Notes
    • Presentation
    • Worksheet
    • Handout
  • Week 7
    • Notes
    • Presentation
    • Worksheet
    • Handout
  • Week 8
    • Notes
    • Presentation
    • Worksheet
    • Handout

On this page

  • Aims
  • Setup
  • Import data
  • Basic interactions
  • Replicating Model 1
  • Exercise 1
  • Exercise 2

Week 3 Computer Lab Worksheet

Interactions and causal inference

Author

Chris Moreh

Aims

In this session we continue working with data from Österman (2021) to replicate parts their analysis. In Week 2 we practised building very basic bivariate and multiple linear regression models. In this session, we will expand on those models and attempt to replicate (some of) the models reported in Table 3 of Österman (2021):

Setup

In Week 1 you set up R and RStudio, and an RProject folder (we called it “HSS8005_labs”) with an .R script and a .qmd or .Rmd document in it (we called these “Lab_1”). Ideally, you saved this on a cloud drive so you can access it from any computer (e.g. OneDrive). You will be working in this folder. If it’s missing, complete Task 2 from the Week 1 Lab.

You can create a new .R script and .qmd/.Rmd for this week’s work (e.g. “Lab_3”). It may be useful to document more of your coding process and interpretations, so you want to work in a .qmd/.Rmd document rather than an .R script.

Import data

As a first step, let’s import the osterman dataset that underpins the Österman (2021) article (see the Data page on the course website for information about the datasets available in this course):

osterman <- sjlabelled::read_stata("https://cgmoreh.github.io/HSS8005-data/osterman.dta")

Basic interactions

To continue on from the toy models that we fitted in Week 2, we may remember that we observed a change in the size of the effect and statistical significance of the gender variable (female) when also accounting for years of education and age. The two models compared as so:

m_m <- lm(trustindex3 ~ eduyrs25 + agea + female, data = osterman)

m_b <- lm(trustindex3 ~ female, data = m_m$model)

sjPlot::tab_model(m_b, m_m)
  trustindex 3 trustindex 3
Predictors Estimates CI p Estimates CI p
(Intercept) 5.24 5.22 – 5.26 <0.001 2.95 2.87 – 3.04 <0.001
female 0.01 -0.02 – 0.04 0.582 0.04 0.01 – 0.07 0.004
eduyrs 25 0.12 0.11 – 0.12 <0.001
Age of
respondent,calculated
0.02 0.01 – 0.02 <0.001
Observations 68211 68211
R2 / R2 adjusted 0.000 / -0.000 0.063 / 0.063

The question arises whether the gender variable interacts with the other predictors, particularly with education as that is the main focus of the research. In other words, we may have reasons to assume that the linear effect of education may be different for males and females. We can interact predictor variables in regression models using the : and * operators. With :, we include in the regression model only the interaction term, but not the component variables (i.e. their main effects); to include the component variables as well, we need to add them in as usual with the + operator. To note that we should normally want to include both main effects and and the interaction effects in a regression model. With *, we include both the main effects and the interaction effects, so it provides a shortcut to using :, but it is often useful to combine the two in more complex interaction scenarios.

In the code below, we interact gender with both education and age, and the two commands have the same result:

mxa <- lm(trustindex3 ~ eduyrs25 + agea +female + female*eduyrs25 + female*agea, data = osterman)

mxb <- lm(trustindex3 ~ female*(eduyrs25 + agea), data = osterman)

## Check that they are equivalent

sjPlot::tab_model(mxa, mxb)
  trustindex 3 trustindex 3
Predictors Estimates CI p Estimates CI p
(Intercept) 3.25 3.13 – 3.36 <0.001 3.25 3.13 – 3.36 <0.001
eduyrs 25 0.10 0.10 – 0.11 <0.001 0.10 0.10 – 0.11 <0.001
Age of
respondent,calculated
0.01 0.01 – 0.02 <0.001 0.01 0.01 – 0.02 <0.001
female -0.53 -0.69 – -0.36 <0.001 -0.53 -0.69 – -0.36 <0.001
eduyrs25:female 0.03 0.02 – 0.03 <0.001
agea:female 0.00 0.00 – 0.01 <0.001
female:eduyrs25 0.03 0.02 – 0.03 <0.001
female:agea 0.00 0.00 – 0.01 <0.001
Observations 68211 68211
R2 / R2 adjusted 0.064 / 0.064 0.064 / 0.064

We find that both interaction terms (eduyrs25:female and agea:female) are statistically significant, which can tell us that there is an interaction that may be useful to include in the model. The greatest challenge is in interpreting results from interaction models, as the usual interpretation of the main effect terms no longer stands and the focus should be on the interaction terms instead. The best approach is to visualise the interactions using a graph, and there are various user-written functions that make this task easy. Just to keep it simple, we will use the plot_model() function from the {sjPlot} package.

First, we can check the interaction between education and gender:

sjPlot::plot_model(mxa, type = "pred", terms = c("eduyrs25", "female"))

This visualization in now easier to interpret than the numerical output. We see a steeper line for females than for males and the lines cross each other at around 12 years of completed schooling. Years spent in education therefore have a stronger influence on social trust for females than for males after around 12 years (which is generally the length of compulsory education in most European countries from which the data is drawn). Among those with education below this level, we find that males have higher levels of social trust, but longer educational careers have a stronger impact for women.

We can check the same for the interaction of gender and age:

sjPlot::plot_model(mxa, type = "pred", terms = c("agea", "female"))

The general pattern is the same, however, the overlapping confidence intervals up until about the age of 53 tell us that the observed difference between the two genders may be due to random variation in our sample. The effect of this interaction is weaker and less reliable than the one with education, and unless we have a strong theoretically motivated reason to include it in our model, we may want to keep a simpler model without this interaction effect.

Replicating Model 1

With this basic understanding of interaction terms, we can now attempt to replicate Model 1 from the Osterman article. Based on the description provided in the article and the appended materials we can reconstruct the main components of their model as shown below. As a first step, run the command and compare our results to those presented in Model 1. Bear in mind that the effects most coefficients are not presented in the main table (see also the table’s footnotes and Appendix 4 in the Supplementary Materials). We also did not include any survey weights in our model and did not include any error corrections, which the author does; so the results will not be exactly the same.

m1 <- lm(trustindex3 ~ reform1_7 + female + blgetmg_d + fbrneur + mbrneur + fnotbrneur + mnotbrneur + agea  + agea + factor(essround) + yrbrn + yrbrn:yrbrn + yrbrn:factor(reform_id_num) + agea:factor(reform_id_num) + agea:agea + agea:agea:factor(reform_id_num) +  factor(reform_id_num), data = osterman)

sjPlot::tab_model(m1)
  trustindex 3
Predictors Estimates CI p
(Intercept) 55.15 -27.45 – 137.76 0.191
reform 1 7 0.06 0.01 – 0.12 0.020
female 0.06 0.03 – 0.08 <0.001
blgetmg d -0.24 -0.35 – -0.13 <0.001
Foreign-born father,born
in Europe
-0.11 -0.19 – -0.03 0.009
Foreign-born mother,born
in Europe
-0.11 -0.19 – -0.02 0.015
fnotbrneur -0.06 -0.21 – 0.08 0.382
mnotbrneur -0.11 -0.26 – 0.04 0.152
Age of
respondent,calculated
-0.02 -0.06 – 0.02 0.354
factor(essround)2 0.06 -0.03 – 0.15 0.175
factor(essround)3 0.16 0.02 – 0.31 0.029
factor(essround)4 0.25 0.04 – 0.46 0.020
factor(essround)5 0.37 0.09 – 0.65 0.009
factor(essround)6 0.42 0.07 – 0.77 0.019
factor(essround)7 0.47 0.06 – 0.89 0.025
factor(essround)8 0.69 0.21 – 1.17 0.005
factor(essround)9 0.84 0.29 – 1.40 0.003
Year of birth -0.03 -0.07 – 0.02 0.233
factor(reform_id_num)2 25.71 -31.48 – 82.90 0.378
factor(reform_id_num)3 45.08 -13.98 – 104.15 0.135
factor(reform_id_num)4 -16.50 -67.09 – 34.09 0.523
factor(reform_id_num)5 50.81 -16.81 – 118.43 0.141
factor(reform_id_num)6 -10.79 -78.90 – 57.33 0.756
factor(reform_id_num)7 40.38 -13.19 – 93.96 0.140
factor(reform_id_num)8 59.01 5.85 – 112.17 0.030
factor(reform_id_num)9 30.98 -29.20 – 91.16 0.313
factor(reform_id_num)10 35.89 -16.04 – 87.82 0.176
factor(reform_id_num)11 26.89 -24.05 – 77.82 0.301
factor(reform_id_num)12 24.03 -33.65 – 81.71 0.414
factor(reform_id_num)13 41.87 -10.16 – 93.91 0.115
factor(reform_id_num)14 -72.42 -140.69 – -4.15 0.038
factor(reform_id_num)15 -4.37 -55.16 – 46.41 0.866
factor(reform_id_num)16 105.35 56.54 – 154.16 <0.001
factor(reform_id_num)17 15.07 -44.38 – 74.51 0.619
factor(reform_id_num)18 20.58 -41.72 – 82.87 0.517
factor(reform_id_num)19 18.69 -57.64 – 95.02 0.631
factor(reform_id_num)20 106.91 -30.96 – 244.77 0.129
factor(reform_id_num)21 0.48 -48.64 – 49.61 0.985
factor(reform_id_num)22 2.49 -47.48 – 52.47 0.922
factor(reform_id_num)23 5.58 -56.32 – 67.48 0.860
factor(reform_id_num)24 52.34 -14.49 – 119.17 0.125
factor(reform_id_num)25 64.69 -3.71 – 133.09 0.064
factor(reform_id_num)26 48.18 -12.09 – 108.44 0.117
factor(reform_id_num)27 -3.59 -56.72 – 49.54 0.895
yrbrn:factor(reform_id_num)2 -0.01 -0.04 – 0.02 0.393
yrbrn:factor(reform_id_num)3 -0.02 -0.05 – 0.01 0.135
yrbrn:factor(reform_id_num)4 0.01 -0.02 – 0.03 0.532
yrbrn:factor(reform_id_num)5 -0.02 -0.06 – 0.01 0.161
yrbrn:factor(reform_id_num)6 0.01 -0.03 – 0.04 0.699
yrbrn:factor(reform_id_num)7 -0.02 -0.05 – 0.01 0.144
yrbrn:factor(reform_id_num)8 -0.03 -0.06 – -0.00 0.032
yrbrn:factor(reform_id_num)9 -0.01 -0.05 – 0.02 0.336
yrbrn:factor(reform_id_num)10 -0.02 -0.04 – 0.01 0.181
yrbrn:factor(reform_id_num)11 -0.01 -0.04 – 0.01 0.293
yrbrn:factor(reform_id_num)12 -0.01 -0.04 – 0.02 0.390
yrbrn:factor(reform_id_num)13 -0.02 -0.05 – 0.01 0.116
yrbrn:factor(reform_id_num)14 0.04 0.00 – 0.07 0.040
yrbrn:factor(reform_id_num)15 0.00 -0.02 – 0.03 0.888
yrbrn:factor(reform_id_num)16 -0.05 -0.08 – -0.03 <0.001
yrbrn:factor(reform_id_num)17 -0.01 -0.04 – 0.02 0.621
yrbrn:factor(reform_id_num)18 -0.01 -0.04 – 0.02 0.526
yrbrn:factor(reform_id_num)19 -0.01 -0.05 – 0.03 0.645
yrbrn:factor(reform_id_num)20 -0.05 -0.12 – 0.02 0.130
yrbrn:factor(reform_id_num)21 0.00 -0.02 – 0.02 0.993
yrbrn:factor(reform_id_num)22 -0.00 -0.03 – 0.02 0.869
yrbrn:factor(reform_id_num)23 -0.00 -0.03 – 0.03 0.870
yrbrn:factor(reform_id_num)24 -0.03 -0.06 – 0.01 0.118
yrbrn:factor(reform_id_num)25 -0.03 -0.07 – 0.00 0.062
yrbrn:factor(reform_id_num)26 -0.02 -0.05 – 0.01 0.117
yrbrn:factor(reform_id_num)27 0.00 -0.02 – 0.03 0.853
agea:factor(reform_id_num)2 -0.02 -0.04 – -0.00 0.019
agea:factor(reform_id_num)3 -0.01 -0.03 – 0.01 0.310
agea:factor(reform_id_num)4 0.01 -0.01 – 0.03 0.240
agea:factor(reform_id_num)5 -0.02 -0.05 – -0.00 0.028
agea:factor(reform_id_num)6 -0.02 -0.04 – 0.01 0.174
agea:factor(reform_id_num)7 -0.02 -0.04 – -0.00 0.018
agea:factor(reform_id_num)8 -0.03 -0.05 – -0.02 <0.001
agea:factor(reform_id_num)9 -0.01 -0.03 – 0.01 0.205
agea:factor(reform_id_num)10 -0.02 -0.04 – -0.01 0.006
agea:factor(reform_id_num)11 -0.00 -0.02 – 0.01 0.569
agea:factor(reform_id_num)12 0.02 -0.01 – 0.04 0.163
agea:factor(reform_id_num)13 -0.01 -0.02 – 0.01 0.509
agea:factor(reform_id_num)14 0.01 -0.02 – 0.03 0.673
agea:factor(reform_id_num)15 -0.00 -0.02 – 0.01 0.561
agea:factor(reform_id_num)16 -0.04 -0.06 – -0.03 <0.001
agea:factor(reform_id_num)17 0.01 -0.01 – 0.03 0.349
agea:factor(reform_id_num)18 -0.03 -0.05 – -0.01 0.004
agea:factor(reform_id_num)19 -0.00 -0.02 – 0.02 0.894
agea:factor(reform_id_num)20 -0.02 -0.09 – 0.05 0.592
agea:factor(reform_id_num)21 0.00 -0.02 – 0.02 0.994
agea:factor(reform_id_num)22 0.00 -0.01 – 0.02 0.537
agea:factor(reform_id_num)23 -0.04 -0.07 – -0.02 0.001
agea:factor(reform_id_num)24 -0.02 -0.04 – 0.01 0.145
agea:factor(reform_id_num)25 -0.02 -0.04 – -0.00 0.026
agea:factor(reform_id_num)26 -0.03 -0.05 – -0.00 0.018
agea:factor(reform_id_num)27 -0.01 -0.02 – 0.01 0.571
Observations 68796
R2 / R2 adjusted 0.199 / 0.198

Exercise 1

This models provides a lot to play around with. For this exercise, start from this complex model and start simplifying it in various ways. Try excluding some of the interaction terms and check the change in the results. Plot the interaction terms and attempt an interpretation. Note down your thoughts and models. Playing around with this model is a good way to learn about the effect of interactions.

Exercise 2

Read through the Osterman article and check your understanding of how the author interprets these results. Particularly, how does the interaction model help us elucidate causal factors in the regression model?

References

David, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.
El-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.
Lord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.
Mulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.
Österman, Marcus. 2021. “Can We Trust Education for Fostering Trust? Quasi-experimental Evidence on the Effect of Education and Tracking on Social Trust.” Social Indicators Research 154 (1): 211–33. https://doi.org/10.1007/s11205-020-02529-y.
Senn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489.
Presentation
Handout